Image Titles - Variations on Show, Attend and Tell
نویسندگان
چکیده
Inspired by recent advances in machine translation and object detection, we implement an image captioning pipeline, consisting of a Fully Convolutional Neural Network piping image features into an image-captioning LSTM, based on the popular Show, Attend, and Tell model. We implement the model in TensorFlow and recreate performance metrics reported in the paper. We identify and experiment with variations on the model, and evaluate them via a series of experiments on the MS COCO benchmark dataset.
منابع مشابه
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to a...
متن کاملShow-and-Fool: Crafting Adversarial Examples for Neural Image Captioning
Modern neural image captioning systems typically adopt the encoder-decoder framework consisting of two principal components: a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for caption generation. Inspired by the robustness analysis of CNN-based image classifiers to adversarial perturbations, we propose Show-and-Fool, a novel algorithm for ...
متن کاملUprising in “Uprising”: A Multimodal Analysis of Bob Marley’s Lyrics
This paper investigates how the theme of uprising is conveyed in Bob Marley’s final music album by the name “Uprising”. Through the methodological lenses of multimodality, attention is focused on how the album cover design, lexical items, literary devices, and other aesthetic ways such as the titles of the ten songs of the album and their order of arrangement contribute to the overall theme of ...
متن کاملLearning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
متن کاملStatistical Review of Theses in Iran Public Universities in Medical Imaging Field
Introduction: In spite of all the researches on medical imaging, this field is still hot. It is the third major area of research, conducted in the world. Unfortunately, there is no statistical evaluation available of the student theses in Iran. It does not seem like a policy is going to be established to address this issue. Providing a statistical insight to activities done in ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017